A Dempster-Shafer Model for Feature Selection in Text Categorization
Similar resources
Joint Feature Transformation and Selection Based on Dempster-Shafer Theory
In statistical pattern recognition, feature transformation attempts to map the original feature space to a low-dimensional subspace in which the newly created features are discriminative and non-redundant, thus improving the predictive power and generalization ability of subsequent classification models. Traditional transformation methods are not designed specifically for tackling data containing unr...
MMR-based Feature Selection for Text Categorization
We introduce a new method of feature selection for text categorization. Our MMR-based feature selection method strives to reduce redundancy between features while maintaining information gain in selecting appropriate features for text categorization. Empirical results show that MMR-based feature selection is more effective than Koller & Sahami’s method, which is one of greedy feature selection ...
Segmentation-based Feature Selection for Text Categorization
Text categorization is an interesting problem in artificial intelligence that gets more and more attention from researchers and industry. One central problem of text categorization is the selection of a good feature set. We propose a novel method for term selection for each category based on segmenting the documents belonging to a category into cohesive sub-parts that define the subtopics of th...
Feature Selection in SVM Text Categorization
This paper investigates the effect of prior feature selection in Support Vector Machine (SVM) text categorization. The input space was gradually increased by using mutual information (MI) filtering and part-of-speech (POS) filtering, which determine the portion of words that are appropriate for learning from the information-theoretic and the linguistic perspectives, respectively. We tested the ...
Feature Selection and Feature Extraction for Text Categorization
The effect of selecting varying numbers and kinds of features for use in predicting category membership was investigated on the Reuters and MUC-3 text categorization data sets. Good categorization performance was achieved using a statistical classifier and a proportional assignment strategy. The optimal feature set size for word-based indexing was found to be surprisingly low (10 to 15 features...
Journal
Journal title: Research Journal of Applied Sciences, Engineering and Technology
Year: 2014
ISSN: 2040-7459, 2040-7467
DOI: 10.19026/rjaset.7.347